Mapping the Denver Housing Market

John Francis Ryan (jfryan70@gmail.com)

Part Two: Exploration

First Glances

Now that your data are ready, you are curious about the neighborhoods: which areas have the highest and lowest median values, and which have seen the best and worst value changes since 2010?

In order to tap into four-year neighborhood housing value changes, you merged ACS neighborhood data from the 2005-2010 five-year estimates. Because the Census Bureau uses neighborhood samples of residents and households, it discourages any cross-time comparisons shorter than five years at a time. Also, while the Census Bureau changes some of its neighborhoods every decade, the data used here arise from the same maps and thus provide the best apples-to-apples comparison of neighborhoods.

Before going ahead, the data need a little more preparation. For various reasons, nine census tracts have no median housing value, so you remove them from the analysis before looking at any correlations. So, let’s do this!

#################################
#

setwd('C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data')

load("ACSData9_14.RData")

# Noticing the nine missing values
table(is.na(ACSData9_14$MedHouseVal))
## 
## FALSE  TRUE 
##   578     9
# Removing the missing values
test <- subset(ACSData9_14, is.na(MedHouseVal))
ACSData9_14nona <- subset(ACSData9_14, !is.na(MedHouseVal)) 

#Now no NA's in median housing value
table(is.na(ACSData9_14nona$MedHouseVal)) 
## 
## FALSE 
##   578
library(maptools)
library(rgdal)
library(sp)
library(ggplot2)
library(ggmap)
library(digest)
library(Hmisc)
gpclibPermit()
## [1] FALSE
library(psych)
library(ggthemes)
library(plyr)
library(dplyr)
library(RColorBrewer)
library(colorspace)

Because you are curious about housing values in the neighborhoods, you shrink the data to only the variables of interest and then arrange the data as you wish. Then, for the sake of exploring the margins, you decide to limit the results to the top twenty census tracts. (You also inspect the data frame to make sure it is ready.)
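The "select, sort, keep the top few" step can be sketched in base R. This is a minimal illustration on a made-up data frame: the `toy` values and the `Extra` column are hypothetical, and only the other column names echo the real data.

```r
# Hypothetical toy data; values are invented for illustration only
toy <- data.frame(
  Tract       = c("1.01", "2.02", "3.03", "4.04", "5.05"),
  County      = c("Adams", "Denver", "Douglas", "Denver", "Arapahoe"),
  MedHouseVal = c(120000, 340000, 510000, 275000, 198000),
  Extra       = 1:5,
  stringsAsFactors = FALSE
)

# Keep only the variables of interest
toy_mhv <- toy[c("Tract", "County", "MedHouseVal")]

# Sort from highest to lowest value, then keep the top n rows
n <- 3
top_n_tracts <- head(toy_mhv[order(-toy_mhv$MedHouseVal), ], n)
top_n_tracts
```

With the real data, setting `n` to 20 would return exactly twenty tracts without a hand-picked dollar cutoff.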

Before digging in deeper, you take a peek at housing values in general.

ACSData9_14nona.mhv <-ACSData9_14nona[c("Tract",
                  "JR_Name",
                  "County",
                  "MedHouseVal",
                  "MedHouseVal_10",
                  "HV_4YrChg")]


class(ACSData9_14nona.mhv)
## [1] "data.frame"
class(ACSData9_14nona.mhv$MedHouseVal)
## [1] "numeric"
summary(ACSData9_14nona$MedHouseVal)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27500  181900  243100  272200  331000 1000000
summary(ACSData9_14nona$MedHouseVal_10)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19200  188700  239300  274800  325000 1000000
summary(ACSData9_14nona$HV_4YrChg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -44.300  -7.200  -1.700  -1.031   3.575 199.300
library(plotly)
library(ggplot2)

qplot(x = MedHouseVal, data = ACSData9_14, binwidth = 10000,
      xlab = 'Median housing value',
      ylab = 'Number of neighborhoods',
      color = MedHouseVal)

ACSData9_14nona$MedHouseVal2 <-ACSData9_14nona$MedHouseVal/1000

t <- ggplot(ACSData9_14nona, aes(x=County, y=MedHouseVal2)) + stat_summary(fun.y="median", geom="bar", fill="darkgreen", color="lightgreen")
t + ylab("Middle value") + xlab("County") +
   labs(title = "Median Housing by County") +
   geom_hline(yintercept=seq(0, 380, by=10), alpha=0.10)

aggdata <-aggregate(ACSData9_14nona$MedHouseVal, by=list(ACSData9_14nona$County), 
   FUN=median, na.rm=TRUE)
aggdata
##     Group.1      x
## 1     Adams 179350
## 2  Arapahoe 221700
## 3    Denver 262700
## 4   Douglas 334400
## 5 Jefferson 257700
summary(ACSData9_14nona$MedHouseVal, na.rm=TRUE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27500  181900  243100  272200  331000 1000000

You notice, first, the fairly bell-shaped distribution of the data. Many neighborhoods have housing values ranging from $150K to $350K. Some neighborhoods south of the city of Denver have much higher values, some higher than the Bureau’s survey can detect.

You also notice the disparity among counties, with Douglas the highest ($334,400) and Adams the lowest ($179,350). The other three fall closer to the median across all tracts in the Denver area ($243,100).

Housing Values: The Highs and Lows

So, which areas have the highest median housing values in 2014?

ACSData9_14nona.mhv %>%
  arrange(-MedHouseVal) %>%
  subset(MedHouseVal > 600000, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
##     Tract           JR_Name    County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1   56.36      Greenwood V.  Arapahoe     1000000        1000000       0.0
## 2   67.04   Cherry Hills V.  Arapahoe     1000000        1000000       0.0
## 3   67.05   Cherry Hills V.  Arapahoe     1000000        1000000       0.0
## 4   67.12      Greenwood V.  Arapahoe      920800         922900      -0.2
## 5   43.03           Hilltop    Denver      839000         812700       3.2
## 6   56.12   GV/Little/Engle  Arapahoe      833200         779000       7.0
## 7   40.06          Univ. Pk    Denver      736200         707800       4.0
## 8   32.03      Country Club    Denver      729600         723100       0.9
## 9  141.23   Castle Pines/CR   Douglas      714300         805700     -11.3
## 10  68.57      Greenwood V.  Arapahoe      700400         649200       7.9
## 11 141.16  Lone Tree/Other    Douglas      697200         627500      11.1
## 12 120.34             Other Jefferson      697200         751400      -7.2
## 13  56.22      Littleton/CV  Arapahoe      689900         759800      -9.2
## 14  39.01           Belcaro    Denver      681900         579300      17.7
## 15 141.35 Highlnds Rch/Lo/S   Douglas      642600         649000      -1.0
## 16  98.48         Evergreen Jefferson      621200         707900     -12.2
## 17     38   Cherry Creek D.    Denver      605400         647500      -6.5
## 18  34.02          Wash. Pk    Denver      603500         540500      11.7
## 19  68.08      Cherry Creek  Arapahoe      602900         605500      -0.4
## 20    850      Centennial/?  Arapahoe      600900         596800       0.7

First, you discover that a high proportion (45%) of these tracts are in Arapahoe County. In fact, the top four sit close to one another in the Cherry Hills Village and Greenwood Village areas! The top three are capped at $1,000,000 because of the Census Bureau survey question, which is also why their four-year change reads 0.0. It seems that what you heard about this part of Denver is, in fact, true.

Second, Denver County itself has a respectable percentage of areas as well (30%). Third, and interestingly, Adams County has no top-twenty neighborhoods.
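The county shares quoted here can be reproduced from a simple tally. The sketch below types in the county counts read off the top-twenty list rather than querying the data frame, so treat it as a check on the arithmetic, not as the original computation.

```r
# County of each top-twenty tract, tallied from the list above
top20_county <- rep(c("Arapahoe", "Denver", "Douglas", "Jefferson"),
                    times = c(9, 6, 3, 2))

# Share of the twenty tracts in each county, as whole percentages
county_share <- round(100 * prop.table(table(top20_county)))
county_share
```

Adams County does not appear in the result at all, because none of its tracts is on the list.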

Which areas have the lowest median value in 2014?

ACSData9_14nona.mhv %>%
  arrange(MedHouseVal) %>%
  subset(MedHouseVal < 108800, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
##    Tract         JR_Name   County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1     81          Aurora    Adams       27500          19200      43.2
## 2  93.16  Federal Hgts/T    Adams       28500          34100     -16.4
## 3  93.21    Federal Hgts    Adams       35400          32200       9.9
## 4  83.09          Aurora    Adams       35900          29900      20.1
## 5    150      N. Wash./W    Adams       38100          55800     -31.7
## 6   93.2  Federal Hgts/W    Adams       46400          71600     -35.2
## 7  91.03        Thornton    Adams       52400          24300     115.6
## 8  93.19    Federal Hgts    Adams       52600          69500     -24.3
## 9  83.08          Aurora    Adams       65000         116800     -44.3
## 10   820          Aurora Arapahoe       69700         110400     -36.9
## 11 93.22        Thornton    Adams       80800          72500      11.4
## 12 70.89         Windsor   Denver       96700         106200      -8.9
## 13 93.18     Thornton/FH    Adams       97900         102600      -4.6
## 14    35 Elaria Swanesea   Denver       98100         152800     -35.8
## 15   871          Aurora Arapahoe      100800         150200     -32.9
## 16 77.04          Aurora Arapahoe      101800         115800     -12.1
## 17   826          Aurora Arapahoe      104400         151400     -31.0
## 18 72.02          Aurora Arapahoe      104600         150600     -30.5
## 19 88.02           Derby    Adams      106300         120200     -11.6
## 20 55.52        Sheridan Arapahoe      108700         113000      -3.8

Adams County dominates this list with 60% of the tracts, including the nine lowest! Why might this be the case, you wonder?

You see some areas in particular: Aurora most frequently, Federal Heights, and Thornton come up quickly. You are curious to know why this is the case but choose to explore later.

You also notice what is absent here: neither Jefferson nor Douglas County has a tract on this particular list.

Housing Values Four Years Prior

You also want to have a handle on changes in housing values from 2010 to 2014. You re-arrange the data to sort by four-year changes in median housing values.
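The `HV_4YrChg` column looks like a plain percent change from the 2010 median to the 2014 median. That formula is an assumption, since the calculation itself is not shown in this part. The sketch below checks it against three tracts that appear in this section's lists; `pct_change` is a hypothetical helper name.

```r
# Assumed formula: percent change from the 2010 value to the 2014 value
pct_change <- function(new, old) {
  round(100 * (new - old) / old, 1)
}

# 2014 and 2010 medians for three tracts listed in this section
val_2014 <- c(Berkley = 133200, Highland = 339700, Aurora_81 = 27500)
val_2010 <- c(Berkley = 44500,  Highland = 251600, Aurora_81 = 19200)

chg <- pct_change(val_2014, val_2010)
chg
```

The results (199.3, 35.0 and 43.2) match the `HV_4YrChg` values reported for those tracts.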

Which areas grew the most in housing values from 2010 to 2014?

qplot(x = HV_4YrChg, data = ACSData9_14, binwidth = 5,
      xlab = 'Four year change, median housing value',
      ylab = 'Number of neighborhoods')

psych::describe(ACSData9_14$HV_4YrChg)  # explicit namespace: Hmisc also exports describe()
##   vars   n  mean    sd median trimmed  mad   min   max range skew kurtosis
## 1    1 578 -1.03 14.69   -1.7   -1.68 8.08 -44.3 199.3 243.6 5.43    66.02
##     se
## 1 0.61
summary(ACSData9_14$HV_4YrChg)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## -44.300  -7.200  -1.700  -1.031   3.575 199.300       9
ACSData9_14nona.mhv$MedHouseVal_10_2 <-ACSData9_14nona$MedHouseVal_10/1000

n <-ACSData9_14nona.mhv %>%
  filter(MedHouseVal_10 < 1000000 & HV_4YrChg < 100) %>%
  ggplot(aes(x = MedHouseVal_10_2, y = HV_4YrChg)) +
  geom_point(alpha = 0.7) + stat_smooth(method = "lm")

n + ylab("Four-year change") + xlab("Median housing value") +
   labs(title = "Four-year change, median housing (2010-2014)") +
   geom_hline(yintercept=seq(0, 20, by=10), alpha=0.10)

You notice that four-year changes in median value are far from uniform across neighborhoods. Some areas experienced high growth rates - one almost 200%, in fact! - perhaps due to their status as new development areas. Overall, you discover a slight decline (mean of -1.03, median of -1.7)! In the second graph, the dispersion of values - and the generally flat line - suggests there is no uniform trend of certain housing brackets weathering the market better than others.

While it is outside the scope of the project to explore this further at this time, it is nonetheless interesting to note.

You are curious, nonetheless, so you make separate lists of highest-growth and lowest-growth neighborhoods. Where are the highest-growth areas?

ACSData9_14nona.mhv %>%
  arrange(-HV_4YrChg) %>%
  subset(HV_4YrChg >= 20.0, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
##     Tract          JR_Name   County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1   97.51          Berkley    Adams      133200          44500     199.3
## 2   91.03         Thornton    Adams       52400          24300     115.6
## 3   27.01     Capitol Hill   Denver      272000         168500      61.4
## 4     872            Other Arapahoe      453700         288100      57.5
## 5      81           Aurora    Adams       27500          19200      43.2
## 6   11.02         Highland   Denver      339700         251600      35.0
## 7   40.04         Smoor Pk   Denver      346800         261300      32.7
## 8     155      Virginia V.   Denver      290600         219400      32.5
## 9       6     Jefferson Pk   Denver      278300         210300      32.3
## 10    869            Other Arapahoe      220100         168200      30.9
## 11  68.58     Greenwood V. Arapahoe      259500         199300      30.2
## 12   5.01       Sloan Lake   Denver      358300         287100      24.8
## 13     21            Baker   Denver      302000         245600      23.0
## 14  70.37          Windsor   Denver      281200         228700      23.0
## 15  29.02       Wash. Pk W   Denver      484800         396200      22.4
## 16  24.02      Five Points   Denver      336400         276200      21.8
## 17   4.01        Sunnyside   Denver      315300         261900      20.4
## 18   3.01         Berkeley   Denver      330500         274900      20.2
## 19  83.09           Aurora    Adams       35900          29900      20.1
## 20 142.04 Other/Roxbrgh Pk  Douglas      282800         235600      20.0

Two areas in Adams County grew very quickly during this period: a tract in Berkley (199%) and one in Thornton (116%). You surmise that these might be newer developments, getting their footing comparatively late. Both areas have lower-than-average values, which is consistent with that guess.

You also notice Denver County tracts dominate this list - with 60% of the highest-growth tracts - while Jefferson County tracts appear nowhere. Is Jefferson a more mature market?

Among areas with the highest growth, there appears to be a healthy spread of housing values. Stated another way, while some of the hottest areas began as low-value areas in 2010 (certain tracts in Berkley, Thornton, Aurora), other tracts with very respectable growth exist with fairly high housing values. The Highland neighborhood in Denver County, for example, started at $251,600 in 2010 but jumped by 35% to $339,700 four years later. The Washington Park West neighborhood had a similar pattern, jumping 22.4% up to $484,800 in 2014.

You explore a little further, by reviewing neighborhoods with the steepest declines. What do you know about these areas?

ACSData9_14nona.mhv %>%
  arrange(HV_4YrChg) %>%
  subset(HV_4YrChg < -22.0, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
##    Tract         JR_Name    County MedHouseVal MedHouseVal_10 HV_4YrChg
## 1  83.08          Aurora     Adams       65000         116800     -44.3
## 2    820          Aurora  Arapahoe       69700         110400     -36.9
## 3     35 Elaria Swanesea    Denver       98100         152800     -35.8
## 4   93.2  Federal Hgts/W     Adams       46400          71600     -35.2
## 5    871          Aurora  Arapahoe      100800         150200     -32.9
## 6    150      N. Wash./W     Adams       38100          55800     -31.7
## 7    826          Aurora  Arapahoe      104400         151400     -31.0
## 8  72.02          Aurora  Arapahoe      104600         150600     -30.5
## 9     15      Globeville    Denver      116500         164200     -29.0
## 10 55.51      Sheridan/E  Arapahoe      123000         170700     -27.9
## 11 67.01        Smoor Pk    Denver      421600         575700     -26.8
## 12 83.05       Montbello    Denver      111100         151500     -26.7
## 13 49.51        Glendale  Arapahoe      118900         159100     -25.3
## 14 87.05   Commerce City     Adams      123700         164200     -24.7
## 15 32.02     Cheesman Pk    Denver      270300         357600     -24.4
## 16 93.19    Federal Hgts     Adams       52600          69500     -24.3
## 17 45.06        Westwood    Denver      115600         151000     -23.4
## 18   865          Aurora  Arapahoe      303400         393000     -22.8
## 19 87.06   Commerce City     Adams      117600         152000     -22.6
## 20   158        Lakewood Jefferson      215400         277600     -22.4

Three counties (Adams, Arapahoe, Denver) account almost equally for the vast majority of such areas (95%), with the remaining one in Jefferson. You also discover that the list leans heavily - 80 percent - toward tracts in the lowest quarter of 2010 values (below $188,700), with most of the rest (15%) in the highest quarter (above $325,000).
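The bracket shares can be tallied directly from the twenty 2010 values in the list. This sketch treats the quartile cutoffs reported by `summary()` for `MedHouseVal_10` earlier in this part as fixed inputs rather than recomputing them.

```r
# 2010 medians of the twenty steepest-decline tracts, copied from the list above
val_2010 <- c(116800, 110400, 152800, 71600, 150200, 55800, 151400, 150600,
              164200, 170700, 575700, 151500, 159100, 164200, 357600, 69500,
              151000, 393000, 152000, 277600)

# Quartile cutoffs taken from summary(MedHouseVal_10) earlier in this part
bracket <- cut(val_2010,
               breaks = c(-Inf, 188700, 325000, Inf),
               labels = c("bottom quarter", "middle half", "top quarter"))

# Percent of the twenty tracts in each bracket
bracket_share <- round(100 * prop.table(table(bracket)))
bracket_share
```

Sixteen tracts fall in the bottom quarter, one in the middle half and three in the top quarter.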

You are also curious to see how the higher-value areas have performed since 2010, so you select tracts whose 2010 value falls in the top tenth (at least $444,000). You drop the top-coded tracts ($1,000,000) because, as noted above, you do not have a reliable measure of their four-year change.
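One way to arrive at a "top tenth" cutoff is `quantile()`. Whether the $444,000 figure was obtained this way is not stated, so the sketch below is only an illustration on hypothetical values, with the top-coded tracts excluded afterwards.

```r
# Hypothetical 2010 medians: 19 evenly spaced values plus two top-coded tracts
top_code <- 1000000
val_2010 <- c(seq(50000, 950000, by = 50000), top_code, top_code)

# Value at the 90th percentile (the minimum of the "top tenth")
cutoff <- quantile(val_2010, probs = 0.90)

# Keep tracts at or above the cutoff, but drop the top-coded ones
high_value <- val_2010[val_2010 >= cutoff & val_2010 < top_code]
high_value
```

The strict `<` against the top code matters: a tract recorded at $1,000,000 in both years shows a change of zero whatever really happened to its value.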

ACSData9_14nona.mhv %>%
  arrange(-MedHouseVal_10) %>%
  subset(MedHouseVal_10 >= 444000 & MedHouseVal_10 < 1000000, select = c(Tract, JR_Name, County, MedHouseVal, MedHouseVal_10, HV_4YrChg))
##     Tract           JR_Name    County MedHouseVal MedHouseVal_10 HV_4YrChg
## 4   67.12      Greenwood V.  Arapahoe      920800         922900      -0.2
## 5   43.03           Hilltop    Denver      839000         812700       3.2
## 6  141.23   Castle Pines/CR   Douglas      714300         805700     -11.3
## 7   56.12   GV/Little/Engle  Arapahoe      833200         779000       7.0
## 8   56.22      Littleton/CV  Arapahoe      689900         759800      -9.2
## 9  120.34             Other Jefferson      697200         751400      -7.2
## 10  32.03      Country Club    Denver      729600         723100       0.9
## 11  98.48         Evergreen Jefferson      621200         707900     -12.2
## 12  40.06          Univ. Pk    Denver      736200         707800       4.0
## 13  68.57      Greenwood V.  Arapahoe      700400         649200       7.9
## 14 141.35 Highlnds Rch/Lo/S   Douglas      642600         649000      -1.0
## 15     38   Cherry Creek D.    Denver      605400         647500      -6.5
## 16 141.16  Lone Tree/Other    Douglas      697200         627500      11.1
## 17 141.22  Castle Pines/CPN   Douglas      589600         616600      -4.4
## 18 146.02   Other/Franktown   Douglas      564000         609900      -7.5
## 19 140.13  Castle Rock/TP/P   Douglas      507300         608000     -16.6
## 20  68.08      Cherry Creek  Arapahoe      602900         605500      -0.4
## 21 144.03     Castle Rock/L   Douglas      533500         601500     -11.3
## 22 141.25     Lone Tree/CPN   Douglas      527500         598800     -11.9
## 23    850      Centennial/?  Arapahoe      600900         596800       0.7
## 24  98.37      Arvada/Other Jefferson      572900         593600      -3.5
## 25    864      Centennial/A  Arapahoe      580000         584100      -0.7
## 26  98.45         Genesee/I Jefferson      592500         580200       2.1
## 27  39.01           Belcaro    Denver      681900         579300      17.7
## 28  67.01          Smoor Pk    Denver      421600         575700     -26.8
## 29   98.5      Other/Golden Jefferson      559400         570800      -2.0
## 30 141.31    Highlnds Rch/L   Douglas      522800         570700      -8.4
## 31  56.29        Centennial  Arapahoe      597100         552900       8.0
## 32  34.02          Wash. Pk    Denver      603500         540500      11.7
## 33 145.06       Castle Rock   Douglas      430500         529900     -18.8
## 34  19.02           Auraria    Denver      450000         513900     -12.4
## 35  44.05       Lowry Field    Denver      454800         511500     -11.1
## 36  98.42          Golden/P Jefferson      437000         500300     -12.7
## 37  98.46       Kittredge/E Jefferson      521400         499100       4.5
## 38 120.35             Other Jefferson      499800         495700       0.8
## 39  40.02         Wellshire    Denver      503400         487900       3.2
## 40  17.01  Union Stn (LoDo)    Denver      472000         483800      -2.4
## 41  34.01          Wash. Pk    Denver      475300         480200      -1.0
## 42     33       Congress Pk    Denver      496200         479600       3.5
## 43    853            Aurora  Arapahoe      439800         474300      -7.3
## 44  43.06        Hilltop/LF    Denver      461700         474200      -2.6
## 45 139.09         Franktown   Douglas      442000         472700      -6.5
## 46  85.41       Thornton/TC     Adams      432700         472400      -8.4
## 47  120.5     Other/Bow Mar Jefferson      450300         468700      -3.9
## 48  56.21      Columbine/Li  Arapahoe      439400         464900      -5.5
## 49    867      Aurora/Other  Arapahoe      425100         459600      -7.5
## 50  41.06         Stapleton    Denver      457800         458600      -0.2
## 51 141.27   Castle Pines N.   Douglas      441200         458600      -3.8
## 52 142.02  Other/Roxbrgh Pk   Douglas      462300         454300       1.8
## 53 141.32      Highlnds Rch   Douglas      447400         453700      -1.4
## 54  98.47         Evergreen Jefferson      478600         450200       6.3
## 55 120.36         Littleton Jefferson      449600         447300       0.5
## 56    851             Other  Arapahoe      405500         446900      -9.3
## 57  68.55    Centennial/CCD  Arapahoe      450100         445800       1.0
## 58 139.01      Aurora/Other   Douglas      390800         444000     -12.0
o <-ACSData9_14nona.mhv %>%
  filter(MedHouseVal_10 >= 444000 & MedHouseVal_10 < 1000000) %>%
  ggplot(aes(x = MedHouseVal_10_2, y = HV_4YrChg)) +
  geom_point(alpha = 0.7) + stat_smooth(method = "lm")

o + ylab("Four-year change") + xlab("Median housing value, 2010") +
   labs(title = "Four-year change, median housing (2010-2014)") +
   geom_hline(yintercept=seq(0, 20, by=10), alpha=0.10)

Douglas and Denver counties together hold a majority of these high-value neighborhoods (about 55%), with Arapahoe and Jefferson next (about 24% and 20%, respectively). Adams County has only one such area, in Thornton/Todd Creek ($472,400 in 2010).

More importantly for you, you notice the majority of such neighborhoods experiencing some decline in value (65%).

Part Three: What Explains Housing Values?

First Steps

In order to understand better any potential explanations for housing values, you provide point graphs along with estimated regression lines (to highlight the best estimated pattern in the data).

For the sake of brevity, you choose to select a subset of interesting variables and display them in a series of correlations and some interesting graphs. In this way, you can begin to understand what factors appear to affect housing values the most.

You also choose to show multiple relationships together, graphically, by using a correlogram. With a correlogram (here, from the corrgram package), you can display a multivariate chart of correlations, showing not only the direction but the magnitude as well!

(There is one quirk in using correlograms within R: they do not handle missing values well. To work around this, you whittle down the dataset by dropping rows with missing values for the variable of interest, median housing value.)
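A base-R sketch of the same idea, on hypothetical values: drop incomplete rows, then build the correlation matrix that a correlogram would shade. Note that `complete.cases()` here removes rows with a missing value in any selected column, which is stricter than filtering on median housing value alone.

```r
# Hypothetical toy data with a few missing values
toy <- data.frame(
  MedHouseVal  = c(150, 220, NA, 400, 310),
  PercBAorMore = c(20, 35, 50, NA, 45),
  Perc_Rented  = c(60, 40, 30, 10, 25)
)

# Keep only rows with no missing value in any selected column
toy_complete <- toy[complete.cases(toy), ]

# Correlation matrix that a correlogram would then shade
cor_matrix <- cor(toy_complete)
round(cor_matrix, 2)
```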

What do you discover?

# Variables for the correlogram; the repeated "IndInc_Perc75KPlus" entry is
# listed once so the column is not selected twice
ACSData9_14nona2 <-ACSData9_14nona[c("MedHouseVal",
                              "Perc_18Plus",
                              "Perc_65Plus",
                              "MedAge",
                              "Perc_Under18",
                              "HHInc_Perc35to50K",
                              "HHInc_Perc35to50K_Owner",
                              "HHInc_Perc50to75K",
                              "HHInc_Perc50to75K_Owner",
                              "HHInc_Perc75to100K",
                              "HHInc_Perc75to100K_Owner",
                              "HHInc_PercUnder35K",
                              "HHInc_Own_PercUnder35K",
                              "HHInc_Perc35Kto100K",
                              "HHInc_Own_Perc35Kto100K",
                              "HHInc_PercOver100K",
                              "HHInc_Own_PercOver100K",
                              "HHs_Perc_MarriedCplFam",
                              "MedianAge",
                              "PercNotHSGrad",
                              "PercGradorProfDegree",
                              "IndInc_Perc25to35K",
                              "IndInc_Perc35to50K",
                              "IndInc_Perc50to65K",
                              "IndInc_Perc75KPlus",
                              "IndInc_Median",
                              "PercBAorMore",
                              "IndInc_PercUpto25K",
                              "IndInc_Perc25Kto75K",
                              "PercUnder150PercPov",
                              "Perc_Rented",
                              "MeanHrsWkd",
                              "SchlDistRnkLYr")]


library(corrgram)
corrgram(ACSData9_14nona2, order=NULL, 
         lower.panel=panel.shade,
         upper.panel=NULL,
         text.panel=panel.txt,
         main="Correlogram of data")

Income and Housing

Is it true that income can help drive neighborhood housing? You are curious to focus on housing owners rather than just the neighborhood at large, figuring this to be a better representation of an individual’s choice to purchase.

p1 <- ggplot(aes(x = HHInc_Own_PercUnder35K, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
           labs(x="Percent of Owners with Income below $35K", y="Median House Value")
p1 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_PercUnder35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5633437
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_PercUnder35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5181778

You begin to notice a moderate negative relationship between the percent of owners earning below $35K and the neighborhood’s median house value (r = -0.563). You might expect this, given that owners with such income are unlikely to afford a high-value house.
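The `use = "pairwise.complete.obs"` argument matters because the full data frame still contains tracts with missing values. A minimal sketch on hypothetical vectors shows the difference from the default behaviour:

```r
# Hypothetical vectors with missing values in different positions
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, NA, 10)

# Default behaviour: any NA makes the result NA
r_default <- cor(x, y)

# Pairwise deletion: only positions where both x and y are present are used
r_pairwise <- cor(x, y, use = "pairwise.complete.obs", method = "pearson")

r_default
r_pairwise
```

Only the first three positions are complete in both vectors, and there `y` is exactly twice `x`, so the pairwise result is 1 while the default is `NA`.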

What about ‘middle-class’ incomes and housing values, however?

p2 <-ggplot(aes(x = HHInc_Own_Perc35Kto100K, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of Owners with Income from $35K to $100K", y="Median House Value")
p2 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35Kto100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.6376349
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_Perc35Kto100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.7471155
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35to50K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5674761
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc35to50K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5849582
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc50to75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5021643
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc50to75K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5906898
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc75to100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.05733788
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Perc75to100K_Owner, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.2464598

As you search even further, you notice a strong - but negative - relationship between mid-range incomes and median house values. This holds true whether you consider all households or simply owners (r = -0.638 and -0.747, respectively). It remains true when drilling down to the $35K to $50K range as well as the $50K to $75K range (r from -0.50 to -0.59) but much less so for the $75K to $100K range (r from -0.06 to -0.25).
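When many variables are compared against the same outcome, the repeated `cor()` calls can be collapsed into one `sapply()` loop. The sketch below uses a hypothetical data frame with placeholder columns `a` and `b`; with the real data, `vars` would hold the income column names.

```r
# Hypothetical toy data; column names a and b are placeholders
toy <- data.frame(
  MedHouseVal = c(100, 200, 300, 400, NA),
  a           = c(5, 4, 3, 2, 1),
  b           = c(1, 2, 3, NA, 5)
)

# Variables to correlate with the outcome
vars <- c("a", "b")

# One correlation per variable, each using the rows complete for that pair
r_by_var <- sapply(vars, function(v) {
  cor(toy$MedHouseVal, toy[[v]], use = "pairwise.complete.obs")
})
round(r_by_var, 3)
```

The result is a named vector, which makes it easy to sort or tabulate the correlations side by side.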

If it is true that mortgage payments should not reach much beyond 35% of one’s gross income, let’s take an example. Someone with an annual income of $70K can probably afford about $24K a year in mortgage payments (35% of $70K is $24,500), which translates to roughly $480K over 20 years (not adjusted for inflation). Thus, one can stretch far but only so far into the housing market.
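The arithmetic behind this example is short enough to write out. The 35% ceiling, the $70K income and the 20-year horizon come from the paragraph above; the variable names are illustrative. Note that this simply sums payments and does not separate principal from interest, so the price of the house one could buy would likely be lower than the total.

```r
annual_income <- 70000   # example income from the text
max_share     <- 0.35    # rule-of-thumb ceiling from the text
years         <- 20      # horizon used in the text

# Ceiling on yearly mortgage payments under the rule of thumb
annual_ceiling <- annual_income * max_share

# The text rounds this down to $24K before multiplying out
annual_payment <- 24000
total_paid     <- annual_payment * years

annual_ceiling
total_paid
```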

You are curious and want to go further up the income ladder. Is there a relationship between higher-income owners and the houses they choose?

p3 <-ggplot(aes(x = HHInc_Own_PercOver100K, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of Owners with Income at Least $100K", y="Median House Value")
p3 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_PercOver100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7624962
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHInc_Own_PercOver100K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.8292535

Household income levels can be important at the higher levels. There is a strong positive relationship between housing values and the share of households earning at least $100K, both for all households and for owner households (r = .762 and .829, respectively). If the results are to be believed, high owner-income neighborhoods tend to have higher housing values.

You also want to explore whether individual - rather than household - income helps explain housing values. You investigate median neighborhood incomes as well as income brackets.

p33 <-ggplot(aes(x = IndInc_Median, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Median Individual Income", y="Median House Value")  # label matches x = IndInc_Median
p33 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Median, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.718012
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc25to35K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5122076
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc35to50K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3774294
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc50to65K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.05571146
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc65to75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2038175
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc75KPlus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.8224486
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_PercUpto25K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5401895
cor(ACSData9_14$MedHouseVal, ACSData9_14$IndInc_Perc25Kto75K, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3553317

There is a strong positive relationship between a neighborhood’s median individual income and housing values (r = .718). As you see above, the relationship varies when drilling down to income categories. It is strongest in high-income saturated areas (percentage earning $75K or more; r = .822) and weakest for the percentage earning either $50K to $65K or $65K to $75K (r = .056 and .204, respectively).

Families and Housing Values

You wonder whether the overall neighborhood’s demographics have an impact on local housing. It sounds right, no? One possibility is that stable families help undergird neighborhoods, which can thus help maintain or improve housing values. You also explore whether the size of a family can affect housing.

p4 <-ggplot(aes(x = HHs_Perc_MarriedCplFam, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of Households as Married-Couple Families", y="Median House Value")
p4 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_Perc_MarriedCplFam, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4211943
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_AvgFamilySize, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3546878

You begin to consider this, as there appears to be a moderate positive connection between the percentage of married-couple families in an area and the median house value (r = .421). You suspect part of this can be due to unmarried owners, who are not represented here. Perhaps there is a sufficient percentage of unmarried homeowners to affect the results.

You find, surprisingly, an inverse relationship between family size and housing values (r = -0.355). Of course, the results are at the neighborhood (not individual) level, which might account for this.

Adult Education and Housing Values

You start to consider the correlation between adult education and housing. It is easy to assume such a link, so you explore this in the data. You also run correlations on education groups.

p6 <-ggplot(aes(x = PercBAorMore, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of People with at Least Bachelor's Degree", y="Median House Value")
p6 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$PercBAorMore, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7704379
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercNotHSGrad, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5348605
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercHSGradOnly, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.6904984
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercSomeCollegetoAA, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3964668
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercHSGradorMore, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.5348381
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercBADegreeOnly, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.6720215
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercGradorProfDegree, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.7952133

As discovered, the education level of adults in a neighborhood matters. The strongest findings link college success with housing: neighborhoods with a greater concentration of college graduates (and higher) tend to have significantly more expensive housing (r = .77), and the link is even stronger for graduate/professional degrees (r = .795). Conversely, areas with higher rates of high school dropouts tend to have significantly lower housing values (r = -.535). In fact, in any category exploring the percentage of people who do NOT complete (at least) a college degree, the relationship is negative. It seems the average earning power of a college degree is, in fact, important!
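If you want to back the word "significantly" with a formal test, base R's cor.test() reports a confidence interval and p-value alongside the correlation. The sketch below runs on simulated data, since the variable names and numbers here are invented and are not the ACS results.

```r
set.seed(42)

# Simulated stand-ins: percent with a bachelor's degree and median house value.
n         <- 100
perc_ba   <- runif(n, min = 5, max = 85)
house_val <- 100000 + 3000 * perc_ba + rnorm(n, sd = 40000)

# Pearson correlation with a 95 percent confidence interval and p-value.
ct <- cor.test(house_val, perc_ba, method = "pearson")
print(ct$estimate)
print(ct$conf.int)
print(ct$p.value)
```

On the real data, the equivalent call would be cor.test(ACSData9_14$MedHouseVal, ACSData9_14$PercBAorMore), which drops incomplete pairs before testing.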

Neighborhood Poverty and Housing Values

Is it true that poverty might be driving housing values? You investigate whether poverty and near-poverty rates are influential.

p7 <-ggplot(aes(x = PercUnder150PercPov, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of People Living Below 150 Percent of Poverty Level", y="Median House Value")
p7 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$PercUnder150PercPov, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.5517491
cor(ACSData9_14$MedHouseVal, ACSData9_14$PercUnderPov, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.4695731

You notice a link between poverty rates and housing values. Areas with a higher percentage of people living below 150 percent of the poverty level tend to have lower housing values (r = -.552), and the link is a little weaker for the poverty rate itself (r = -.470).

House Characteristics and Housing Values

Perhaps the ‘age’ of the houses can have an impact on housing values. You wonder, that is, whether people tend to value ‘older’ houses less than their newer - and presumably sturdier - counterparts. Is it true that, as houses age, they decrease in relative value? Is it also true that neighborhoods with ‘larger’ houses (multi-unit structures) tend to have higher-value houses?

p8 <-ggplot(aes(x = MedianHouseYr, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Median Year House Built", y="Median House Value")
p8 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$MedianHouseYr, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2378942
cor(ACSData9_14$MedHouseVal, ACSData9_14$HHs_Total_UnitsinStructure_PercTwoPlusUnitStructure, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.191573

Do newer houses tend to command higher values? The results here suggest otherwise, as there does not appear to be a strong pattern in the data (r = .238). Of course, it may be that owners add material value to their older houses (e.g. house additions, a pool, a larger garage).

You discover little link between the percentage of multi-unit houses and housing values (r = -0.1916).

Renting and Vacancies

You are also curious to see whether renting or owning has any impact on housing values, and whether vacancies tend to depress housing values.

p10 <-ggplot(aes(x = Perc_Rented, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Percent of Units Rented [not Owned]", y="Median House Value")
p10 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_Rented, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3403892
cor(ACSData9_14$MedHouseVal, ACSData9_14$Housing_Perc_Vacant, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.04364004

The results are modest here: housing values have a weak negative relationship with the percentage of units rented (r = -.340) and almost no relationship with the percentage of vacant units (r = -0.0436).

Working and Housing

You want to see whether the Great Recession had any long-lasting impact on neighborhood housing. You consider whether the average number of hours worked - as a proxy for employment status - is linked to housing values.

p11 <-ggplot(aes(x = MeanHrsWkd, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Person's Average Usual Hours Worked", y="Median House Value")
p11 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$MeanHrsWkd, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.2632697

The results suggest otherwise, with a small correlation at best (r = 0.263). Of course, it can be partly a function of the survey, since the types (and payscales) of work people pursue can vary greatly. Alternately, it can also be considered a crude measure of the economic ‘health’ of a community.

School Quality and Housing Values

Last, with all the talk about child education driving people’s housing decisions, you want to see whether this holds any water.

p12 <-ggplot(aes(x = SchlDistRnkLYr, y = MedHouseVal), data = ACSData9_14) +
            geom_point(alpha = .4) + 
            labs(x="School District Ranking, Prior Year", y="Median House Value")
p13 <- p12 + stat_smooth(method = "lm", alpha = .2, size = 0.5)
p13 + geom_jitter(width=.08)

cor(ACSData9_14$MedHouseVal, ACSData9_14$SchlDistRnkLYr, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3654633
cor(ACSData9_14$MedHouseVal, ACSData9_14$SchlDistRnk2YrAgo, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3490175

What you find is, at best, a relatively modest correlation between a school district’s performance and housing values (r = .3655). This is a bit surprising, given all the talk about schools and educational performance. Of course, you are not drilling down into individual schools and you are not tracking individual moves within the metropolitan area. Perhaps there is a stronger link to individual schools instead of the district. You decide that will be for a different day.

While it is possible to point to the importance of private schools, you know that the national average of private school attendance is only around 10 percent and Denver is not among the top 10 percent of large metropolitan areas for private school attendance. (Source)
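Because every tract in a district shares the same district ranking, the school variable takes only a handful of distinct values. One optional cross-check, not part of the original analysis, is to compare the Pearson correlation with the rank-based Spearman correlation. The toy numbers below are invented for illustration.

```r
# Invented toy data: six tracts in three districts, so rankings are tied within a district.
district_rank <- c(0.2, 0.2, 0.5, 0.5, 0.8, 0.8)
house_val     <- c(150, 180, 210, 205, 400, 900)

# Pearson uses the raw values; Spearman uses their ranks (ties get average ranks).
pearson  <- cor(house_val, district_rank, method = "pearson")
spearman <- cor(house_val, district_rank, method = "spearman")
print(c(pearson = pearson, spearman = spearman))
```

On the real data, this amounts to swapping method = c("pearson") for method = "spearman" in the cor() calls above.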

Population Age and Housing Values

p14 <-ggplot(aes(x = MedAge, 
           y = MedHouseVal), data = ACSData9_14) +
           geom_point(alpha = .4) +
          labs(x="Neighborhood Median Age", y="Median House Value")
p14 + stat_smooth(method = "lm", alpha = .2, size = 0.5)

cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_18Plus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.1417501
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_65Plus, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.1692704
cor(ACSData9_14$MedHouseVal, ACSData9_14$MedAge, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4616155
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_Under18, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.1417501
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_20to29, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.3049146
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_30to39, use = "pairwise.complete.obs", method = c("pearson"))
## [1] -0.2990187
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_40to49, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3050462
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_50to59, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.3896994
cor(ACSData9_14$MedHouseVal, ACSData9_14$Perc_60to69, use = "pairwise.complete.obs", method = c("pearson"))
## [1] 0.4083626

Most relationships are modest, the strongest being median age (r = .4616). Some are negative, such as the percentage of people 20 to 29 or 30 to 39 (r = -0.3049 and -.2990, respectively). Perhaps it is true that not many people in their twenties purchase a house quickly, but it does not explain people in their thirties.

Mapping Relationships Visually

Preparing the Data

Okay, so perhaps you - like many others - hate staring at graphs. It will make more sense to create visual maps instead. In this way, you can see how neighborhoods can vary - sometimes drastically! After all, visual maps are a powerful way to tell a data story, no?

Since you have been thinking about the downtown and immediate southern suburbs first, you have decided to narrow the maps thus. A more interactive version of various variables is available separately as a Shiny app.

Creating these types of maps takes three basic steps. First, you will use a simple yet accurate starting map to represent the Denver metropolitan area. For this stage of the project, Google’s maps are sufficient.

Second, you collect some geographic files to represent the boundaries of each census tract. The Census Bureau provides shapefiles for just such an occasion, providing polygon shapes.

Third, in order to represent the variables of interest, you merge the shapefiles into your dataset.

There are a few necessary technical steps in order to use the shapefiles. You must read the GEOID variable as a character vector, and then fortify the shapefile, which converts its polygons into a data frame that ggplot2 can draw.
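The reason the GEOID must be handled as text becomes clearer with a small example. The sketch below uses invented toy keys and assumes, as the code that follows suggests, that the ACS ids lost their leading zero when they were read as numbers. It shows one way to bring both sides to the same form and to check which tracts would fail to match before joining.

```r
# Invented toy keys: shapefile GEOIDs are text with a leading zero,
# while the ACS ids are assumed to be numeric and therefore lack that zero.
shape_ids <- c("08001007801", "08001007802", "08031000102")
acs_ids   <- c(8001007801, 8031000102)

# Drop the single leading zero on the shapefile side.
shape_key <- sub("^0", "", shape_ids)

# Convert the numeric ids to plain text, avoiding scientific notation.
acs_key <- format(acs_ids, scientific = FALSE, trim = TRUE)

# Keys present in the shapefile but missing from the ACS table.
unmatched <- setdiff(shape_key, acs_key)
print(unmatched)
```

A check like this before the join tells you how many polygons will end up with missing values on the map.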

#  Start mapping
#  Define map source type and color


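# Bounding box in the order get_map() expects: left, bottom, right, top
DenverMetro <- c(left = -106, bottom = 39.1, right = -104, top = 40.3)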
DenverMetroMap <- get_map(location=DenverMetro, source = "google", 
                          maptype = "roadmap", zoom = 11, crop=FALSE)


# Add polygons from tract file
# https://www.census.gov/geo/maps-data/data/cbf/cbf_tracts.html
setwd('C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data')
getwd()
## [1] "C:/Users/John/Denver_Housing_Project/ACS_Data/Final_Data"
tract <- readOGR(dsn=".", layer = "cb_2014_08_tract_500k")
## OGR data source with driver: ESRI Shapefile 
## Source: ".", layer: "cb_2014_08_tract_500k"
## with 1249 features
## It has 9 fields
tract@data$GEOID<-as.character(tract@data$GEOID)


# convert polygons to data.frame
Denver_tract<-fortify(tract, region = "GEOID") 
str(Denver_tract)
## 'data.frame':    85865 obs. of  7 variables:
##  $ long : num  -105 -105 -105 -105 -105 ...
##  $ lat  : num  39.7 39.7 39.7 39.7 39.7 ...
##  $ order: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ piece: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
##  $ id   : chr  "08001007801" "08001007801" "08001007801" "08001007801" ...
##  $ group: Factor w/ 1276 levels "08001007801.1",..: 1 1 1 1 1 1 1 1 2 2 ...
Denver_tract$id <-substring(Denver_tract$id, 2) # id had extra 0 to left

str(Denver_tract$id)
##  chr [1:85865] "8001007801" "8001007801" "8001007801" ...
library(reshape) 
Denver_tract <-rename(Denver_tract, c('id'='id2'))



##########################################
#
# Join polygons to housing value data
#

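# The rename above left the key in id2; copy it explicitly instead of relying on partial matching of `$`
Denver_tract$id <-as.character(Denver_tract$id2)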
ACSData9_14nona$id <-as.character(ACSData9_14nona$id)


ACSData10_14 <-left_join(Denver_tract, ACSData9_14nona, by=c('id')) 

save(ACSData10_14, file = "ACSData10.RData")


###########################
# 
# Plot median housing value
# 

describe(ACSData9_14$MedHouseVal, na.rm = TRUE)
##   vars   n     mean       sd median  trimmed      mad   min   max  range
## 1    1 578 272222.5 137976.3 243100 254655.6 109564.1 27500 1e+06 972500
##   skew kurtosis      se
## 1 1.83     5.53 5739.06
quantile(ACSData9_14$MedHouseVal, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##    87.5%      75%    62.5%      50%    37.5%      25%    12.5% 
## 396812.5 330950.0 281012.5 243100.0 215150.0 181900.0 149325.0
# Plot median housing values

ggmap(DenverMetroMap) + 
  # group = group (from fortify) keeps multi-piece tracts as separate polygons
  geom_polygon(aes(x = long, y = lat, group=group, fill = MedHouseVal),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .3968125, .330950, .2810125, .243100, .215150, .181900, .149325, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Median House Value by Census Tract, 2014')

#########################
#
# Plot Low-income households
#

describe(ACSData9_14$HHInc_Own_PercUnder35K, na.rm = TRUE)
##   vars   n  mean    sd median trimmed  mad min max range skew kurtosis
## 1    1 580 17.22 10.79  14.65      16 9.41   0 100   100  1.7     6.64
##     se
## 1 0.45
quantile(ACSData9_14$HHInc_Own_PercUnder35K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##   87.5%     75%   62.5%     50%   37.5%     25%   12.5% 
## 29.6625 22.2250 18.5000 14.6500 11.9125  9.3000  6.8375
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_PercUnder35K), data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .296625, .22225, .185, .1465, .119125, .093, .068375, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income Below $35K')

##########################################
# 
# Plot middle-income values
#

describe(ACSData9_14$HHInc_Own_Perc35Kto100K, na.rm = TRUE)
##   vars   n mean    sd median trimmed   mad min  max range  skew kurtosis
## 1    1 580 45.5 13.56  46.65   45.98 14.08   0 79.8  79.8 -0.37        0
##     se
## 1 0.56
quantile(ACSData9_14$HHInc_Own_Perc35Kto100K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##  87.5%    75%  62.5%    50%  37.5%    25%  12.5% 
## 60.400 55.100 51.000 46.650 42.600 36.200 28.875
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_Perc35Kto100K), data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .604, .551, .51, .4665, .426, .362, .28875, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income from $35K to $100K')

##########################################
# 
# Plot upper-income values
#

describe(ACSData9_14$HHInc_Own_PercOver100K, na.rm = TRUE)
##   vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## 1    1 580 37.28 19.15   36.1    36.7 21.65   0 100   100 0.25     -0.7
##     se
## 1 0.79
quantile(ACSData9_14$HHInc_Own_PercOver100K, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##   87.5%     75%   62.5%     50%   37.5%     25%   12.5% 
## 61.4875 51.5000 44.4750 36.1000 28.7125 21.6000 14.2125
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = HHInc_Own_PercOver100K),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .614875, .515, .44475, .361, .287125, .216, .142125, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Owners with Household Income Above $100K')

##########################################
# 
# Plot married-couple values
#

describe(ACSData9_14$HHs_Perc_MarriedCplFam, na.rm = TRUE)
##   vars   n  mean    sd median trimmed   mad min  max range skew kurtosis
## 1    1 583 48.75 18.15   47.4   48.73 19.72 5.6 90.3  84.7 0.04    -0.68
##     se
## 1 0.75
quantile(ACSData9_14$HHs_Perc_MarriedCplFam, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##  87.5%    75%  62.5%    50%  37.5%    25%  12.5% 
## 71.750 62.650 54.550 47.400 42.025 35.500 27.800
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = HHs_Perc_MarriedCplFam),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .7175, .6265, .5455, .474, .42025, .355, .278, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Married Couple Families, 2014')

##########################################
# 
# Plot percent with college degree values
#

describe(ACSData9_14$PercBAorMore, na.rm = TRUE)
##   vars   n  mean    sd median trimmed  mad min  max range skew kurtosis
## 1    1 584 39.48 19.84   39.4   39.31 23.8 2.8 86.5  83.7 0.07    -0.97
##     se
## 1 0.82
quantile(ACSData9_14$PercBAorMore, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##   87.5%     75%   62.5%     50%   37.5%     25%   12.5% 
## 63.5125 55.2000 47.5000 39.4000 31.3625 23.1750 13.5000
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = PercBAorMore),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .635125, .552, .475, .394, .313625, .23175, .135, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent of People with a College Degree or More')

##########################################
# 
# Plot percent of people below 150 percent poverty level values
#

describe(ACSData9_14$PercUnder150PercPov, na.rm = TRUE)
##   vars   n  mean    sd median trimmed   mad min  max range skew kurtosis
## 1    1 583 20.22 15.04   15.7    18.5 13.94 0.3 92.4  92.1 0.99     0.66
##     se
## 1 0.62
quantile(ACSData9_14$PercUnder150PercPov, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##  87.5%    75%  62.5%    50%  37.5%    25%  12.5% 
## 39.800 30.350 21.700 15.700 11.225  8.200  5.200
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = PercUnder150PercPov),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .398, .3035, .217, .157, .11225, .082, .052, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Living Below 150 Percent Poverty Level')

####################################
# 
# Plot percent vacant data
#

describe(ACSData9_14$Housing_Perc_Vacant, na.rm = TRUE)
##   vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
## 1    1 583 5.28 3.91    4.8    4.94 3.56   0  33    33 1.64     6.89 0.16
quantile(ACSData9_14$Housing_Perc_Vacant, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5%   75% 62.5%   50% 37.5%   25% 12.5% 
## 9.425 7.300 6.000 4.800 3.600 2.550 1.200
# Plot percent vacant values


ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = Housing_Perc_Vacant),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .09425, .073, .06, .048, .036, .0255, .012, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Housing Units Vacant by Census Tract, 2014')

####################################
# 
# Plot renter data


describe(ACSData9_14$Perc_Rented, na.rm = TRUE)
##   vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## 1    1 583 35.46 23.71  31.19   33.54 26.71   0 100   100 0.58     -0.5
##     se
## 1 0.98
quantile(ACSData9_14$Perc_Rented, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
##     87.5%       75%     62.5%       50%     37.5%       25%     12.5% 
## 65.697053 51.911798 41.133582 31.193581 23.761539 15.011655  7.952541
# Plot percent rented values


ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = Perc_Rented),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .65697053, .51911798, .41133582, .31193581, .23761539, .15011655, .07952541, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('Percent Housing Units Rented by Census Tract, 2014')

####################################
#
# Plot school district rank data

describe(ACSData9_14$SchlDistRnkLYr, na.rm = TRUE)
##   vars   n mean   sd median trimmed mad  min  max range  skew kurtosis
## 1    1 587 0.48 0.21   0.56    0.48 0.3 0.09 0.77  0.68 -0.07     -1.4
##     se
## 1 0.01
quantile(ACSData9_14$SchlDistRnkLYr, c(.875, .75, .625, .5, .375, .25, .125), na.rm = TRUE)
## 87.5%   75% 62.5%   50% 37.5%   25% 12.5% 
## 0.763 0.682 0.590 0.558 0.294 0.294 0.205
ggmap(DenverMetroMap) + 
  geom_polygon(aes(x = long, y = lat, group=group, fill = SchlDistRnkLYr),
             data = ACSData10_14, color="black", alpha = .4, size = .2) +
            scale_fill_gradientn(colours = c("red4", "red2", "red", "lightcoral", "lightblue3", "cyan4", "blue", "midnightblue"),
                       values = c(1, .763, .682, .59, .59, .558, .426, .294, 0)) +
  labs(x = 'Longitude', y = 'Latitude') + ggtitle('School District State Rank by Census Tract, 2013')
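Each map above hard-codes its colour stops by copying the quantile output into the values argument by hand. If you would rather compute those stops, the sketch below shows one way to do it in base R. Note that quantile_stops() is a hypothetical helper and the toy vector is invented; also, it rescales against the variable's observed range instead of dividing by 100 or by one million, so its stops will differ slightly from the hand-typed ones.

```r
# Hypothetical helper: turn quantile cut points into positions on a 0-1 scale,
# the scale that the values argument of scale_fill_gradientn() works on.
quantile_stops <- function(x, probs = seq(0.125, 0.875, by = 0.125)) {
  rng  <- range(x, na.rm = TRUE)
  cuts <- quantile(x, probs, na.rm = TRUE, names = FALSE)
  c(0, (cuts - rng[1]) / (rng[2] - rng[1]), 1)
}

# Invented toy percentages standing in for a tract-level variable.
toy   <- c(0, 5, 10, 15, 20, 30, 40, 60, NA, 100)
stops <- quantile_stops(toy)
print(round(stops, 3))
print(length(stops))  # seven quantiles plus the two endpoints
```

The ggplot2 documentation describes values as one position per colour, so nine stops pair most naturally with a nine-colour palette, whereas the maps above pass eight colours. With the real data the call would be something like scale_fill_gradientn(colours = my_nine_colours, values = quantile_stops(ACSData9_14$MedHouseVal)), where my_nine_colours is a placeholder for a palette of your choosing.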